UltraCLR: Contrastive Representation Learning Framework for Ultrasound-based Sensing

Published: 11 May 2024

  • Abstract

    We propose UltraCLR, a new contrastive learning framework that fuses dual-modulation ultrasonic sensing signals to enhance gesture representations. Most existing ultrasound-based gesture recognition systems rely on large amounts of manually labeled samples to learn task-specific representations via end-to-end training, and therefore cannot exploit unlabeled continuous gesture signals that are easy to collect. Inspired by recent self-supervised learning techniques, UltraCLR aims to autonomously learn, from low-cost unlabeled signals, a general-purpose gesture representation that benefits all downstream tasks. We use the short-time Fourier transform (STFT) heatmap as a secondary input and leverage contrastive learning to improve representations of the high-quality channel impulse response (CIR) heatmap input. The learned representations better capture the spatial position and intermediate states of gesture movements. With the representation learned by UltraCLR, downstream gesture recognition tasks become much simpler: they can be completed by a lightweight classifier trained on a small training set at low computational cost. Our experimental results show that UltraCLR outperforms state-of-the-art gesture recognition systems with only a few labeled samples, reducing computational complexity by more than 85% and improving inference speed by over 9×.
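    The core mechanism the abstract describes, contrastively aligning embeddings of the same gesture seen through two ultrasonic views (CIR heatmaps and STFT heatmaps), can be illustrated with a minimal InfoNCE-style loss. This is a sketch, not the paper's implementation: the encoder outputs, batch size, and temperature below are placeholder assumptions.

    ```python
    import numpy as np

    def info_nce_loss(z_cir, z_stft, temperature=0.1):
        """One-directional InfoNCE over a batch of paired embeddings.

        z_cir, z_stft: (batch, dim) arrays standing in for the CIR-heatmap
        and STFT-heatmap encoder outputs for the same batch of gestures
        (hypothetical placeholders, not the paper's actual encoders).
        """
        # L2-normalize so the dot product becomes cosine similarity
        z1 = z_cir / np.linalg.norm(z_cir, axis=1, keepdims=True)
        z2 = z_stft / np.linalg.norm(z_stft, axis=1, keepdims=True)
        logits = (z1 @ z2.T) / temperature   # (batch, batch) similarity matrix
        # Row-wise log-softmax; each gesture's matching cross-modal
        # embedding sits on the diagonal (the "positive" pair)
        row_max = logits.max(axis=1, keepdims=True)
        log_prob = logits - row_max - np.log(
            np.exp(logits - row_max).sum(axis=1, keepdims=True))
        n = len(z1)
        return -log_prob[np.arange(n), np.arange(n)].mean()
    ```

    Correctly paired batches yield a lower loss than mismatched ones, and minimizing this loss is what pulls the two modality representations together without any gesture labels.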


    Cited By

    • (2024) Network Information Security Monitoring Under Artificial Intelligence Environment. International Journal of Information Security and Privacy 18, 1 (2024), 1–25. DOI: 10.4018/IJISP.345038
    • (2024) Heterogeneous Fusion and Integrity Learning Network for RGB-D Salient Object Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7 (2024), 1–24. DOI: 10.1145/3656476
    • (2024) MultiRider: Enabling Multi-Tag Concurrent OFDM Backscatter by Taming In-band Interference. In Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 292–303. DOI: 10.1145/3643832.3661862
    • (2024) Driver intention prediction based on multi-dimensional cross-modality information interaction. Multimedia Systems 30, 2 (2024). DOI: 10.1007/s00530-024-01282-3


      Published In

      ACM Transactions on Sensor Networks  Volume 20, Issue 4
      July 2024
      603 pages
      ISSN:1550-4859
      EISSN:1550-4867
      DOI:10.1145/3618082
      Editor: Wen Hu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 11 May 2024
      Online AM: 29 May 2023
      Accepted: 08 May 2023
      Revised: 09 March 2023
      Received: 22 December 2022
      Published in TOSN Volume 20, Issue 4


      Author Tags

      1. Ultrasound-based sensing
      2. contrastive learning
      3. gesture recognition

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Program B for Outstanding Ph.D. Candidates of Nanjing University

      Article Metrics

      • Downloads (Last 12 months)275
      • Downloads (Last 6 weeks)26
      Reflects downloads up to 09 Aug 2024
