DOI: 10.1145/3450268.3453537
IoTDI '21 Conference Proceedings | Research Article | Public Access
When Video meets Inertial Sensors: Zero-shot Domain Adaptation for Finger Motion Analytics with Inertial Sensors

Published: 18 May 2021

Abstract

Ubiquitous finger motion tracking enables a number of exciting applications in augmented reality, sports analytics, rehabilitation, healthcare, and more. While finger motion tracking with cameras is very mature, largely due to the availability of massive training datasets, there is a dearth of training data for developing robust machine learning (ML) models for wearable IoT devices with inertial measurement unit (IMU) sensors. To address this problem, this paper presents ZeroNet, a system that demonstrates the feasibility of developing ML models for IMU sensors with zero training overhead. ZeroNet harvests training data from publicly available videos for performing inferences on IMU data. The differences between the video and IMU domains introduce a number of challenges, arising from mismatched sensor-camera coordinate systems, users' body sizes, speed and orientation changes during gesturing, sensor position variations, and more. ZeroNet addresses these challenges by systematically extracting motion data from videos and transforming it into the acceleration and orientation information measured by IMU sensors. Furthermore, data-augmentation techniques create synthetic variations in the harvested training data to enhance the generalizability and robustness of the ML models to user diversity. An evaluation with 10 users demonstrates a top-1 accuracy of 82.4% and a top-3 accuracy of 94.8% for recognition of 50 finger gestures, indicating promise. While we have only scratched the surface, we outline a number of interesting possibilities for extending this work in the cross-disciplinary areas of computer vision, machine learning, and wearable IoT to enable novel applications in finger motion tracking.
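The core video-to-IMU idea in the abstract can be sketched at a high level: fingertip keypoints harvested from video frames are double-differentiated into an acceleration signal, and random scaling and jitter create the synthetic variations used for augmentation. The sketch below is purely illustrative and is not ZeroNet's actual pipeline; all function names and parameters are our own assumptions.

```python
import numpy as np

def keypoints_to_accel(positions, fps=30.0):
    """Convert a 2D fingertip trajectory (N, 2) harvested from video into a
    synthetic acceleration signal by double differentiation. `positions` is in
    arbitrary image units; a real pipeline would first rescale to metres using
    a body-size prior and rotate into the sensor's coordinate frame."""
    dt = 1.0 / fps
    velocity = np.gradient(positions, dt, axis=0)  # (N, 2), units/s
    accel = np.gradient(velocity, dt, axis=0)      # (N, 2), units/s^2
    return accel

def augment(signal, rng, scale_range=(0.8, 1.2), noise_std=0.05):
    """Create a synthetic variation of one training sample: random amplitude
    scaling (body-size/speed diversity) plus Gaussian jitter (sensor noise)."""
    scale = rng.uniform(*scale_range)
    return signal * scale + rng.normal(0.0, noise_std, size=signal.shape)

# Example: a circular gesture trajectory sampled at 30 fps for one second.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)[:, None]
traj = np.hstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
acc = keypoints_to_accel(traj)   # harvested "IMU" sample
aug = augment(acc, rng)          # augmented copy for training
```

A sanity check on this construction: a constant-velocity trajectory should produce (near-)zero acceleration, since double differentiation of a linear signal vanishes.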



Published In

IoTDI '21: Proceedings of the International Conference on Internet-of-Things Design and Implementation
May 2021, 288 pages
ISBN: 9781450383547
DOI: 10.1145/3450268

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Data augmentation
  2. Finger gesture
  3. IMU
  4. IoT
  5. Wearable

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Cited By

  • IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(3), 1-32 (2024). DOI: 10.1145/3678545
  • Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs. Proceedings of the 2024 ACM International Symposium on Wearable Computers, 25-31 (2024). DOI: 10.1145/3675095.3676609
  • MARLP: Time-series Forecasting Control for Agricultural Managed Aquifer Recharge. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4862-4872 (2024). DOI: 10.1145/3637528.3671533
  • Enhancing the Applicability of Sign Language Translation. IEEE Transactions on Mobile Computing 23(9), 8634-8648 (2024). DOI: 10.1109/TMC.2024.3350111
  • Poster Abstract: Enhancing Human Motion Sensing with Synthesized Millimeter-Waves. 2024 23rd ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), 307-308 (2024). DOI: 10.1109/IPSN61024.2024.00054
  • I Am an Earphone and I Can Hear My User's Face: Facial Landmark Tracking Using Smart Earphones. ACM Transactions on Internet of Things 5(1), 1-29 (2023). DOI: 10.1145/3614438
  • SignRing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7(3), 1-29 (2023). DOI: 10.1145/3610881
  • Synthetic Smartwatch IMU Data Generation from In-the-wild ASL Videos. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7(2), 1-34 (2023). DOI: 10.1145/3596261
  • SmartASL. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7(2), 1-21 (2023). DOI: 10.1145/3596255
  • SoundSieve: Seconds-Long Audio Event Recognition on Intermittently-Powered Systems. Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 28-41 (2023). DOI: 10.1145/3581791.3596859
