MultiSense: Cross-labelling and Learning Human Activities Using Multimodal Sensing Data

Authors:

Xiang-Yang Li

ACM Transactions on Sensor Networks, Volume 19, Issue 3

Article No.: 65, Pages 1 - 26

https://doi.org/10.1145/3578267

Published: 17 April 2023

Abstract

To tap into the gold mine of data generated by Internet of Things (IoT) devices with unprecedented volume and value, there is an urgent need to label raw sensor data efficiently and accurately. To this end, we explore and leverage the hidden connections among the multimodal data collected by various sensing devices and propose to let data of different modalities complement and learn from each other. However, it is challenging to align and fuse multimodal data without knowing their perception (and thus the correct labels). In this work, we propose MultiSense, a paradigm for automatically mining potential perception, cross-labelling the data of each modality, and then updating the learning models for recognizing human activity to achieve higher accuracy or even to recognize new activities. We design innovative solutions for segmenting, aligning, and fusing multimodal data from different sensors, as well as a model-updating mechanism. We implement our framework and conduct comprehensive evaluations on a rich set of data. Our results demonstrate that MultiSense significantly improves the usability of the data and the power of the learning models. With nine diverse activities performed by users, our framework automatically labels multimodal sensing data generated by five different sensing mechanisms (video, smart watch, smartphone, audio, and wireless channel) with an average accuracy of 98.5%. Furthermore, it enables the models of some modalities to learn unknown activities from other modalities and greatly improves their activity recognition ability.
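The cross-labelling idea can be illustrated with a small sketch. This is not the fusion method of MultiSense, which the abstract does not specify; it is a simple confidence-weighted vote, offered only to make the idea concrete. The function names `fuse_labels` and `cross_label`, the thresholds `min_share` and `confident`, and the example predictions are all hypothetical, and the sketch assumes the segments from each modality have already been aligned in time.

```python
from collections import defaultdict


def fuse_labels(predictions, min_share=0.5):
    """Fuse per-modality guesses for one time-aligned segment.

    predictions maps a modality name to (label, confidence in [0, 1]).
    Returns the label holding at least min_share of the total confidence,
    or None when no label is dominant enough. (Hypothetical voting rule,
    not the fusion scheme used in the paper.)
    """
    scores = defaultdict(float)
    for label, confidence in predictions.values():
        scores[label] += confidence
    total = sum(scores.values())
    if total == 0:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] / total >= min_share else None


def cross_label(predictions, confident=0.8, min_share=0.5):
    """Return {modality: fused_label} for modalities that should be relabelled.

    A modality is relabelled when its own model disagreed with the fused
    label or agreed with confidence below `confident`.
    """
    fused = fuse_labels(predictions, min_share)
    if fused is None:
        return {}
    return {
        modality: fused
        for modality, (label, confidence) in predictions.items()
        if label != fused or confidence < confident
    }


# Illustrative values only; not data from the paper.
segment = {
    "video": ("walking", 0.95),
    "watch": ("walking", 0.90),
    "audio": ("unknown", 0.30),
    "wifi": ("sitting", 0.40),
}
new_labels = cross_label(segment)
print(new_labels)
```

In this example the video and smart-watch models agree with high confidence, so the audio and wireless-channel segments receive the label "walking" and could then be used as training data to update those weaker models.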

Index Terms

1. Human-centered computing
  1. Ubiquitous and mobile computing
2. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Time series analysis

Published In

ACM Transactions on Sensor Networks Volume 19, Issue 3

August 2023

597 pages

ISSN:1550-4859

EISSN:1550-4867

DOI:10.1145/3584865

Editor:
Yunhao Liu
Tsinghua University, China

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 17 April 2023

Online AM: 31 January 2023

Accepted: 26 November 2022

Revised: 06 October 2022

Received: 05 June 2022

Published in TOSN Volume 19, Issue 3

Qualifiers

Research-article

Funding Sources

National Key R&D Program of China
China National Natural Science Foundation
The Fundamental Research Funds for the Central Universities
