DOI: 10.1145/3450268.3453537
IoTDI '21 Conference Proceedings | Research Article | Public Access
When Video meets Inertial Sensors: Zero-shot Domain Adaptation for Finger Motion Analytics with Inertial Sensors

Published: 18 May 2021

Abstract

Ubiquitous finger motion tracking enables a number of exciting applications in augmented reality, sports analytics, rehabilitation, healthcare, and more. While finger motion tracking with cameras is very mature, largely due to the availability of massive training datasets, there is a dearth of training data for developing robust machine learning (ML) models for wearable IoT devices with inertial measurement unit (IMU) sensors. To address this problem, this paper presents ZeroNet, a system that demonstrates the feasibility of developing ML models for IMU sensors with zero training overhead. ZeroNet harvests training data from publicly available videos for performing inferences on IMU data. The differences between the video and IMU domains introduce a number of challenges, arising from mismatched sensor-camera coordinate systems, users' body sizes, speed and orientation changes during gesturing, sensor position variations, and more. ZeroNet addresses these challenges by systematically extracting motion data from videos and transforming it into the acceleration and orientation information measured by IMU sensors. Furthermore, data-augmentation techniques create synthetic variations in the harvested training data to enhance the generalizability and robustness of the ML models to user diversity. An evaluation with 10 users demonstrates a top-1 accuracy of 82.4% and a top-3 accuracy of 94.8% for recognition of 50 finger gestures, indicating promise. While we have only scratched the surface, we outline a number of interesting possibilities for extending this work in the cross-disciplinary areas of computer vision, machine learning, and wearable IoT to enable novel applications in finger motion tracking.
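The core video-to-IMU idea in the abstract can be sketched at a high level: fingertip keypoints harvested from video frames are double-differentiated into an acceleration signal, and random scaling and jitter create the synthetic variations used for augmentation. The sketch below is purely illustrative and is not ZeroNet's actual pipeline; all function names and parameters are our own assumptions.

```python
import numpy as np

def keypoints_to_accel(positions, fps=30.0):
    """Convert a 2D fingertip trajectory (N, 2) harvested from video into a
    synthetic acceleration signal by double differentiation. `positions` is in
    arbitrary image units; a real pipeline would first rescale to metres using
    a body-size prior and rotate into the sensor's coordinate frame."""
    dt = 1.0 / fps
    velocity = np.gradient(positions, dt, axis=0)  # (N, 2), units/s
    accel = np.gradient(velocity, dt, axis=0)      # (N, 2), units/s^2
    return accel

def augment(signal, rng, scale_range=(0.8, 1.2), noise_std=0.05):
    """Create a synthetic variation of one training sample: random amplitude
    scaling (body-size/speed diversity) plus Gaussian jitter (sensor noise)."""
    scale = rng.uniform(*scale_range)
    return signal * scale + rng.normal(0.0, noise_std, size=signal.shape)

# Example: a circular gesture trajectory sampled at 30 fps for one second.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)[:, None]
traj = np.hstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
acc = keypoints_to_accel(traj)   # harvested "IMU" sample
aug = augment(acc, rng)          # augmented copy for training
```

A sanity check on this construction: a constant-velocity trajectory should produce (near-)zero acceleration, since double differentiation of a linear signal vanishes.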



Published In

IoTDI '21: Proceedings of the International Conference on Internet-of-Things Design and Implementation
May 2021, 288 pages
ISBN: 9781450383547
DOI: 10.1145/3450268

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Data augmentation
  2. Finger gesture
  3. IMU
  4. IoT
  5. Wearable

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Cited By

  • IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(3), 1-32 (2024). DOI: 10.1145/3678545
  • Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs. Proceedings of the 2024 ACM International Symposium on Wearable Computers, 25-31 (2024). DOI: 10.1145/3675095.3676609
  • MARLP: Time-series Forecasting Control for Agricultural Managed Aquifer Recharge. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4862-4872 (2024). DOI: 10.1145/3637528.3671533
  • Enhancing the Applicability of Sign Language Translation. IEEE Transactions on Mobile Computing 23(9), 8634-8648 (2024). DOI: 10.1109/TMC.2024.3350111
  • Poster Abstract: Enhancing Human Motion Sensing with Synthesized Millimeter-Waves. 2024 23rd ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), 307-308 (2024). DOI: 10.1109/IPSN61024.2024.00054
  • I Am an Earphone and I Can Hear My User's Face: Facial Landmark Tracking Using Smart Earphones. ACM Transactions on Internet of Things 5(1), 1-29 (2023). DOI: 10.1145/3614438
  • SignRing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7(3), 1-29 (2023). DOI: 10.1145/3610881
  • Synthetic Smartwatch IMU Data Generation from In-the-wild ASL Videos. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7(2), 1-34 (2023). DOI: 10.1145/3596261
  • SmartASL. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7(2), 1-21 (2023). DOI: 10.1145/3596255
  • SoundSieve: Seconds-Long Audio Event Recognition on Intermittently-Powered Systems. Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 28-41 (2023). DOI: 10.1145/3581791.3596859
