Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model

Authors:

Candice Muller,

Frédéric Blanc,

Hyewon Seo

Medical Image Computing and Computer Assisted Intervention – MICCAI 2024: 27th International Conference, Marrakesh, Morocco, October 6–10, 2024, Proceedings, Part V

Pages 251 - 261

https://doi.org/10.1007/978-3-031-72086-4_24

Published: 07 October 2024

Abstract

We present a knowledge augmentation strategy for assessing diagnostic groups and gait impairment from monocular gait videos. Based on a large-scale pre-trained Vision Language Model (VLM), our model learns and improves visual, textual, and numerical representations of patient gait videos through collective learning across three distinct modalities: gait videos, class-specific descriptions, and numerical gait parameters. Our contributions are twofold. First, we adopt a knowledge-aware prompt tuning strategy that uses the class-specific medical descriptions to guide text prompt learning. Second, we integrate the paired gait parameters in the form of numerical texts to enhance the numeracy of the textual representation. Results demonstrate that our model not only significantly outperforms state-of-the-art methods in video-based classification tasks but also adeptly decodes the learned class-specific text features into natural language descriptions using the vocabulary of quantitative gait parameters. The code and the model will be made available at our project page: https://lisqzqng.github.io/GaitAnalysisVLM/.
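As a rough illustration of two ideas named in the abstract, the sketch below renders gait parameters as numerical text and picks a class by comparing a video feature with class-specific text features. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, parameter names, class labels, and toy feature vectors are all hypothetical, and cosine similarity stands in for whatever scoring the actual model uses on its learned VLM embeddings.

```python
import math

def gait_params_to_text(params):
    """Render numerical gait parameters as one plain-text sentence.

    `params` maps a parameter name to a (value, unit) pair. The names and
    units used here are hypothetical placeholders.
    """
    parts = [f"{name} is {value:.2f} {unit}" for name, (value, unit) in params.items()]
    return "The " + ", the ".join(parts) + "."

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def classify(video_feature, class_text_features):
    """Return the label whose text feature is most similar to the video feature."""
    scores = {label: cosine(video_feature, feat)
              for label, feat in class_text_features.items()}
    return max(scores, key=scores.get), scores

# Toy inputs (assumed): real features would come from the video and text encoders.
sentence = gait_params_to_text({
    "gait speed": (0.85, "m/s"),
    "cadence": (102.0, "steps/min"),
})
label, scores = classify(
    [1.0, 0.0],
    {"class A": [0.9, 0.1], "class B": [0.1, 0.9]},
)
print(sentence)
print(label)
```

In this toy setup the sentence would be fed to the text encoder alongside the class description; here it is only printed to show the numerical-text format.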

Published In

Medical Image Computing and Computer Assisted Intervention – MICCAI 2024: 27th International Conference, Marrakesh, Morocco, October 6–10, 2024, Proceedings, Part V

Oct 2024

814 pages

ISBN: 978-3-031-72085-7

DOI: 10.1007/978-3-031-72086-4

Editors:

Marius George Linguraru, Children’s National Hospital/George Washington University, Washington, DC, USA

Qi Dou, The Chinese University of Hong Kong, Hong Kong, China

Aasa Feragen, Technical University of Denmark, Kgs Lyngby, Denmark

Stamatia Giannarou, Imperial College London, London, UK

Ben Glocker, Imperial College London, London, UK

Karim Lekadir, Universitat de Barcelona, Barcelona, Spain

Julia A. Schnabel, Helmholtz Munich, Technical University of Munich and King’s College London, Munich, Germany

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Publisher

Springer-Verlag, Berlin, Heidelberg
