UltraCLR: Contrastive Representation Learning Framework for Ultrasound-based Sensing

Published: 11 May 2024

  • Abstract

    We propose UltraCLR, a new contrastive learning framework that fuses dual-modulation ultrasonic sensing signals to enhance gesture representations. Most existing ultrasound-based gesture recognition systems rely on large amounts of manually labeled samples to learn task-specific representations via end-to-end training, and therefore cannot exploit unlabeled continuous gesture signals that are easy to collect. Inspired by recent self-supervised learning techniques, UltraCLR aims to autonomously learn, from low-cost unlabeled signals, a general-purpose gesture representation that benefits all downstream tasks. We use the short-time Fourier transform (STFT) heatmap as a secondary input and leverage contrastive learning to improve representations of the high-quality channel impulse response (CIR) heatmap input. The learned representations better capture the spatial position and intermediate states of gesture movements. With the representation learned by UltraCLR, downstream gesture recognition tasks become much simpler: they can be completed by a lightweight classifier trained on a small training set at low computational cost. Our experimental results show that UltraCLR outperforms state-of-the-art gesture recognition systems with only a few labeled samples, reducing computational complexity by more than 85% and improving inference speed by over 9×.
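    The core mechanism the abstract describes, contrastively aligning embeddings of the same gesture seen through two ultrasonic views (CIR heatmaps and STFT heatmaps), can be illustrated with a minimal InfoNCE-style loss. This is a sketch, not the paper's implementation: the encoder outputs, batch size, and temperature below are placeholder assumptions.

    ```python
    import numpy as np

    def info_nce_loss(z_cir, z_stft, temperature=0.1):
        """One-directional InfoNCE over a batch of paired embeddings.

        z_cir, z_stft: (batch, dim) arrays standing in for the CIR-heatmap
        and STFT-heatmap encoder outputs for the same batch of gestures
        (hypothetical placeholders, not the paper's actual encoders).
        """
        # L2-normalize so the dot product becomes cosine similarity
        z1 = z_cir / np.linalg.norm(z_cir, axis=1, keepdims=True)
        z2 = z_stft / np.linalg.norm(z_stft, axis=1, keepdims=True)
        logits = (z1 @ z2.T) / temperature   # (batch, batch) similarity matrix
        # Row-wise log-softmax; each gesture's matching cross-modal
        # embedding sits on the diagonal (the "positive" pair)
        row_max = logits.max(axis=1, keepdims=True)
        log_prob = logits - row_max - np.log(
            np.exp(logits - row_max).sum(axis=1, keepdims=True))
        n = len(z1)
        return -log_prob[np.arange(n), np.arange(n)].mean()
    ```

    Correctly paired batches yield a lower loss than mismatched ones, and minimizing this loss is what pulls the two modality representations together without any gesture labels.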


    Cited By

    • (2024) Network Information Security Monitoring Under Artificial Intelligence Environment. International Journal of Information Security and Privacy 18, 1 (2024), 1–25. DOI: 10.4018/IJISP.345038
    • (2024) Heterogeneous Fusion and Integrity Learning Network for RGB-D Salient Object Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7 (2024), 1–24. DOI: 10.1145/3656476
    • (2024) MultiRider: Enabling Multi-Tag Concurrent OFDM Backscatter by Taming In-band Interference. In Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 292–303. DOI: 10.1145/3643832.3661862
    • (2024) Driver intention prediction based on multi-dimensional cross-modality information interaction. Multimedia Systems 30, 2 (2024). DOI: 10.1007/s00530-024-01282-3


      Published In

      ACM Transactions on Sensor Networks  Volume 20, Issue 4
      July 2024
      603 pages
      ISSN:1550-4859
      EISSN:1550-4867
      DOI:10.1145/3618082
      Editor: Wen Hu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 11 May 2024
      Online AM: 29 May 2023
      Accepted: 08 May 2023
      Revised: 09 March 2023
      Received: 22 December 2022
      Published in TOSN Volume 20, Issue 4


      Author Tags

      1. Ultrasound-based sensing
      2. contrastive learning
      3. gesture recognition

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Program B for Outstanding Ph.D. Candidates of Nanjing University

      Article Metrics

      • Downloads (Last 12 months)275
      • Downloads (Last 6 weeks)26
      Reflects downloads up to 09 Aug 2024
